Gut microbial subtypes and clinicopathological value for colorectal cancer

Abstract
Background: Gut bacteria are related to colorectal cancer (CRC) and its clinicopathologic characteristics.
Objective: To develop gut bacterial subtypes and explore potential microbial targets for CRC.
Methods: Stool samples from 914 volunteers (376 CRCs, 363 advanced adenomas, and 175 normal controls) were included for 16S rRNA sequencing. Unsupervised learning was used to generate gut microbial subtypes. Gut bacterial community composition and clustering effects were plotted. Differences in gut bacterial abundance were analyzed. Then, the association of CRC‐associated bacteria with subtypes and the association of gut bacteria with clinical information were assessed. CatBoost models based on differential gut bacteria were constructed to identify diseases, including CRC and advanced adenoma (AA).
Results: Four gut microbial subtypes (A, B, C, D) were finally obtained via unsupervised learning. The characteristic bacteria of each subtype were Escherichia‐Shigella in subtype A, Streptococcus in subtype B, Blautia in subtype C, and Bacteroides in subtype D. Clinical information (e.g., free fatty acids and total cholesterol) and CRC pathological information (e.g., tumor depth) varied among gut microbial subtypes. Bacilli, Lactobacillales, etc., were positively correlated with subtype B. Positive correlation of Blautia, Lachnospiraceae, etc., with subtype C and negative correlation of Coriobacteriia, Coriobacteriales, etc., with subtype D were found. Finally, the predictive ability of CatBoost models for CRC identification was improved based on gut microbial subtypes.
Conclusion: Gut microbial subtypes provide characteristic gut bacteria and are expected to contribute to the diagnosis of CRC.


| INTRODUCTION
Colorectal cancer (CRC) is characterized by a low 5-year survival rate, a high mortality rate, and a concerning shift toward younger age at diagnosis, all of which contribute to a significant global disease burden. 1,2 CRC is prone to metastasis, which accounts for most CRC-related deaths. 3 Early-stage CRC can be effectively treated with surgery, but its insidious symptoms often lead to diagnosis at intermediate or advanced stages, causing patients to miss the optimal window for treatment.
2][13] For instance, differential gut bacteria (e.g., Bifidobacterium, norank_f__Oscillospiraceae, and Eisenbergiella) are enriched in poorly differentiated CRC patients, which suggests a potential link between the degree of CRC pathological differentiation and gut bacteria. 115][16] Finally, gut bacteria are related to drug efficacy and play an important role in the treatment of CRC. Dysregulation of gut bacteria significantly reduced the efficacy of 5-fluorouracil in homologous transplanted CRC mice, resulting in increased tumor weight and volume. This highlights the vital role of gut bacteria in regulating the host's response to chemotherapeutic drugs. 17
Currently, the molecular typing of CRC based on the heterogeneity of CRC molecular characteristics has made rapid progress and greatly promotes the development of individualized precision therapy. 18,191][22] In addition, the consensus molecular subtype (CMS) classification has been successfully applied to demonstrate the heterogeneity of colorectal adenomas. CMS3, the most prevalent type of adenoma (73%), was found to have the lowest risk of progression to CRC. 23 Nevertheless, characteristic microbiota typing for CRC has not yet been explored.
In this study, CRC typing based on gut bacteria was conducted via unsupervised clustering, aiming to explore potential molecular targets for CRC stratification and diagnosis from the perspective of gut microbes. Moreover, the correlation between gut bacteria and clinical information was analyzed. Our findings are expected to provide potential biomarkers for the identification of CRC and point out a new direction for clinical diagnosis of the disease.

| Volunteer recruitment and data acquisition
A total of 914 volunteers were recruited from the Huzhou Central Hospital from January 1, 2020, to December 31, 2022, including 376 patients with CRC, 363 patients with advanced adenoma (AA), and 175 normal controls (NC), and their stool samples were collected. Both CRC and AA patients included in this study were pathologically diagnosed. The exclusion criteria for CRC patients were as follows: (1) other malignancies; (2) other serious cardiopulmonary diseases; (3) a history of antibiotic, hormonal, or gut microbiota use in the 3 months prior to admission; (4) other intestinal diseases, including ulcerative colitis and Crohn's disease; and (5) mental illness or cognitive impairment.
The Huzhou Central Hospital ethics committee (no.: 20191101-01) and the Chinese Clinical Trial Registry (http://www.chictr.org.cn, no.: ChiCTR2100050167) approved the study plan involving the patients' clinical information and informed consent. In addition, the corresponding clinical information (e.g., age and gender) and pathological characteristics were collected from the hospital information management system (Table 1).

| Data processing and clustering
16S rRNA sequencing was used to detect gut bacterial sequences in the stool samples. The processing of microbial sequencing data followed previous studies. 11 Unsupervised clustering was carried out in R software (version 4.3.1). 24 The ConsensusClusterPlus package was employed to perform consensus clustering on the gut bacterial metagenomics data. This clustering divided the samples into subtypes, facilitating subsequent comparative analyses among different subtypes. The specific parameters used were maxK = 9, reps = 50, pItem = 0.8, pFeature = 1, distance = "euclidean", and clusterAlg = "km". 25 The proportion of ambiguous clustering (PAC) measure was used to select the best K value to optimize the clustering model.
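The resampling idea behind consensus clustering and a PAC-style choice of K can be sketched as follows. This is a minimal NumPy illustration, not the ConsensusClusterPlus implementation used in the study: the k-means routine, the toy data, the function names, and the PAC thresholds of 0.1 and 0.9 are all assumptions made for the example. Only `reps = 50` and the 80% item resampling mirror the parameters stated above.

```python
import numpy as np


def kmeans_labels(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every sample to every center.
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def consensus_matrix(X, k, reps=50, p_item=0.8, seed=0):
    """Fraction of resampling runs in which two samples share a cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))  # times a pair was clustered together
    sampled = np.zeros((n, n))   # times a pair was drawn in the same run
    m = max(k, int(round(p_item * n)))
    for _ in range(reps):
        idx = rng.choice(n, size=m, replace=False)
        labels = kmeans_labels(X[idx], k, rng)
        same = labels[:, None] == labels[None, :]
        sampled[np.ix_(idx, idx)] += 1
        together[np.ix_(idx, idx)] += same
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(sampled > 0, together / sampled, 0.0)


def pac_score(cons, lower=0.1, upper=0.9):
    """Share of sample pairs whose consensus is neither near 0 nor near 1."""
    vals = cons[np.triu_indices_from(cons, k=1)]
    return float(np.mean((vals > lower) & (vals < upper)))


rng = np.random.default_rng(1)
# Hypothetical toy data: two well-separated groups of 20 samples, 5 features.
X = np.vstack([rng.normal(0, 1, (20, 5)), rng.normal(50, 1, (20, 5))])
pac_by_k = {k: pac_score(consensus_matrix(X, k)) for k in (2, 3, 4)}
best_k = min(pac_by_k, key=pac_by_k.get)  # lowest PAC = most stable K
print(pac_by_k, best_k)
```

A lower PAC means fewer sample pairs are clustered together in only some of the runs, so the K with the lowest PAC is taken as the most stable partition.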

| Data Analysis
Descriptive analysis, difference analysis, correlation analysis, and CatBoost model construction were performed on gut bacteriological data among different diseases and subtypes.
The genus-level data were visualized with R and the ggplot2 package to show species composition among different subtypes; cluster composition in different diseases and disease composition in different clusters were also presented. In addition, principal coordinate analysis (PCoA) and non-metric multidimensional scaling (NMDS) analysis for different diseases and subtypes were carried out, with specific steps following previous studies. 11 SPSS 27.0 statistical software was used to analyze clinical information for differences among subtypes, and linear discriminant analysis and the Kruskal-Wallis test were used to screen for differential bacteria in different diseases and subtypes. Additionally, clinical information with statistical differences (p < 0.05) was further analyzed by the Kruskal-Wallis test among gut microbial subtypes.
Spearman correlation coefficients were calculated for different diseases and subtypes to describe the association of CRC-associated bacteria and gut microbial characteristics with diseases and subtypes.
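As an illustration of how a taxon can be correlated with a subtype, the sketch below computes a Spearman coefficient between an abundance vector and a 0/1 subtype indicator. The text does not state how group membership was encoded, so the indicator coding, the function names, and the abundance values are assumptions for this example.

```python
import numpy as np


def average_ranks(values):
    """Rank data from 1..n, giving tied values their average rank."""
    values = np.asarray(values, dtype=float).ravel()
    _, inverse, counts = np.unique(values, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)              # rank of the last item in each tie group
    mean_rank = last_rank - (counts - 1) / 2.0
    return mean_rank[inverse]


def spearman_r(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])


# Hypothetical relative abundances of one taxon in six samples.
abundance = np.array([0.02, 0.05, 0.04, 0.30, 0.25, 0.40])
subtypes = np.array(["A", "A", "A", "B", "B", "B"])

# Assumed encoding: 1 if the sample belongs to subtype B, else 0.
in_subtype_b = (subtypes == "B").astype(float)
r_b = spearman_r(abundance, in_subtype_b)
print(round(r_b, 3))
```

A positive coefficient here means the taxon tends to be more abundant in samples assigned to that subtype; repeating the calculation per taxon and per subtype would give one cell of a correlation heat map each.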
Differential bacteria were included to construct CatBoost models for disease identification, and the construction method followed previous studies. 11 During model construction, the data were divided into a training set (80%) to build the model and a test set (20%) to verify the model. Subtype B had no test set because of its small sample size.
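The 80/20 division can be sketched as below. The text does not say whether the split was stratified by diagnosis, so stratification, the function name, and the seed are assumptions. The group sizes are the cohort counts given above, and the subtype B counts (25 CRC, 5 NC, 27 AA) are those reported in the Results.

```python
import numpy as np


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), sampling test_fraction within each class."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(test_fraction * len(idx)))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.array(sorted(train_idx)), np.array(sorted(test_idx))


# Whole cohort: 376 CRC, 363 AA, 175 NC.
cohort = np.array(["CRC"] * 376 + ["AA"] * 363 + ["NC"] * 175)
train, test = stratified_split(cohort)
print(len(train), len(test))

# Subtype B alone: 25 CRC, 5 NC, 27 AA.
subtype_b = np.array(["CRC"] * 25 + ["NC"] * 5 + ["AA"] * 27)
_, test_b = stratified_split(subtype_b)
n_nc_in_test = int(np.sum(subtype_b[test_b] == "NC"))
print(n_nc_in_test)
```

Under these assumptions, a 20% hold-out from subtype B would contain a single normal control, which illustrates why a separate test set is not informative for such a small group.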
The flow chart of this study is shown in Figure S1.

| Gut microbial subtypes based on unsupervised learning algorithm
Unsupervised clustering of gut bacteria from the three populations (CRC, AA, and NC) showed that the best effect was achieved with a cluster number of 9 (Figure 1A). The ratio of people with disease status (AA and CRC) to NC varied across clusters ( The composition of clusters and subtypes varied in different populations (Figure 1C,D). C1, C5, and C3 were the top three clusters in CRC and AA, while C1, C8, and C5 were the top three clusters in NC. Subtypes A and D were the top two subtypes in CRC and NC, while subtypes A and C were the top two subtypes in AA. Significant structural differences in gut bacteria at the genus level were found among the four subtypes (Figure 1E). For example, Escherichia-Shigella, Blautia, and Eubacterium_hallii_group were the top three bacteria in subtype A, while Bacteroides, Bifidobacterium, and Prevotella were the top three bacteria in subtype D. Moreover, the correlation between gut bacteria and subtypes also differed (Figure 1F). Escherichia-Shigella showed a higher correlation with subtype A, while Streptococcus showed a higher correlation with subtype B. Romboutsia, Blautia, etc. were more strongly associated with subtype C, while Bacteroides, Bifidobacterium, etc. were more strongly associated with subtype D. Additionally, the important characteristic gut bacteria of the four subtypes are shown in Figure 1G. Bacteroides were enriched in subtype D, and Blautia showed higher abundance in subtype C. Escherichia-Shigella were abundant in subtype A, and Streptococcus were enriched in subtype B.
Before subtype classification, the clustering effect was not satisfactory (Figure S2A,B). After subtypes were formed, the clustering effect was improved (Figure S2C-J). Subtype B had the best clustering effect (Figure S2D,H), followed by subtype D (Figure S2F,J).
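The disease-to-control ratio used to compare clusters and subtypes is simple arithmetic, sketched here. The function name and the second group are hypothetical; the first group uses the subtype B counts reported later in the Results (25 CRC, 27 AA, 5 NC). The rule the authors used to merge clusters with similar ratios is not specified, so it is not reproduced.

```python
from collections import Counter


def disease_ratio(diagnoses):
    """(CRC + AA) / NC for one cluster or subtype."""
    counts = Counter(diagnoses)
    diseased = counts["CRC"] + counts["AA"]
    controls = counts["NC"]
    return diseased / controls if controls else float("inf")


groups = {
    # Counts reported for subtype B in the Results.
    "subtype B": ["CRC"] * 25 + ["AA"] * 27 + ["NC"] * 5,
    # Hypothetical group, for comparison only.
    "hypothetical group": ["CRC"] * 30 + ["AA"] * 30 + ["NC"] * 30,
}
ratios = {name: disease_ratio(dx) for name, dx in groups.items()}
ranked = sorted(ratios, key=ratios.get, reverse=True)
print(ratios, ranked)
```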

| Clinical information relationship with gut microbial subtypes
Clinically available information, such as gender and age, was compared among the four subtypes (Table 5). In addition to the differences in the number of patients with CRC among the four subtypes, there were statistically significant differences in platelet, alanine aminotransferase, aspartate aminotransferase, free fatty acids, and total cholesterol (p < 0.05).
To better interpret the relationship between clinical information and gut microbial subtype, differential gut bacteria that played an important role in both gut microbial subtype and important clinical information were screened. The top 20 differential gut bacteria among different subtypes are displayed in Figure 2A. For example, Streptococcus showed higher abundance in subtype B, Blautia were enriched in subtype C, and Bacteroides were more abundant in subtype D.

In total, 5 (3.47%) gut bacteria were found to differ both among subtypes and between total cholesterol levels (Figure 2B). Actinomyces and Rothia were significantly enriched in the high total cholesterol group, while Holdemania, Eubacterium fissicatena group, etc. were enriched in the normal total cholesterol group (Figure 2D). Ten (6.76%) bacteria were found to differ both among subtypes and between free fatty acid levels (Figure 2C). Furthermore, Eubacterium_hallii_group, Ruminococcus, Dorea, etc. showed higher abundance in the low free fatty acids group (Figure 2E).
Moreover, the common differential gut bacteria for platelet, alanine aminotransferase, and aspartate aminotransferase are shown in Figure S3.

| CRC pathological characteristics relationship with gut microbial subtypes
Based on the high sensitivity of tumor markers in screening, the examination of tumor markers provides a reliable reference for clinical diagnosis. Differential expression of nine tumor markers, such as CA125 and CA153, among the four subtypes is presented in Figure S4A. Notably, CA19-9 was overexpressed in subtype B (p = 0.0089).
Pathological information of CRC patients in each gut microbial subtype, including degree of differentiation and tumor depth, was analyzed (Figure S4B-I). For example, adenocarcinoma (94.11%), well-differentiated tumors (68.18%), and lymph nodes without metastasis (68.18%) were relatively enriched in subtype B. The ulcerative type (54.55%) was the characteristic tumor shape of subtype D, and the tumor tended to occur in the colon (56.06%).

| CRC-associated bacteria among different gut microbial subtypes
This study identified CRC-associated bacteria, beneficial bacteria, and harmful bacteria through a literature review. The association between characteristic gut bacteria and different diseases and subtypes is expected to provide insight into potential links between disease states and gut microbial subtypes. Bacteria with significant correlations with different populations (|r| > 0.14) are shown in Figure 3A. A significant negative correlation was found between positively related bacteria (e.g., Burkholderia and Veillonellaceae) and AA. Bacilli (r = 0.38) and Lactobacillales (r = 0.38) were positively correlated with subtype B, and Solobacterium (r = −0.18), Erysipelotrichia (r = −0.20), etc. were negatively correlated with subtype D (Figure 3B). In addition, Fusobacterium, a representative positively related bacterium of CRC, was found to be significantly enriched in subtypes A and D. For NC, AA, and CRC, there were differences in bacterial abundance among the subtypes, but no statistical significance was found (Figure S5). Negatively related bacteria, including Betaproteobacteria (r = 0.28) and Burkholderiales (r = 0.28), were significantly positively correlated with NC, while Lactobacillaceae (r = −0.16) and Propionibacterium (r = −0.11) were negatively correlated. Burkholderiales (r = −0.30), Betaproteobacteria (r = −0.30), and Sphingomonas (r = −0.18) were found to have a significant negative correlation with AA (Figure 3C). A significant positive correlation existed between Bacteroidia and subtype D (r = 0.36), and between Rothia and subtype B (r = 0.27). Coriobacteriia (r = −0.25), Coriobacteriales (r = −0.24), etc. were negatively correlated with subtype D (Figure 3D).
There was a significant positive correlation between CRC-related bacteria (e.g., Bacteroides and Faecalibacterium) and NC. Blautia (r = 0.18) was positively correlated with AA, and Oscillibacter (r = −0.15) was negatively correlated with AA. A significant positive correlation was found between Lactobacillus and CRC (r = 0.17) (Figure 3E). Blautia (r = 0.59), Lachnospiraceae (r = 0.56), Clostridia (r = 0.45), Eubacteriales (r = 0.45), etc. were found to be positively correlated with subtype C. Moreover, Blautia (r = −0.36), Clostridia (r = −0.40), Eubacteriales (r = −0.40), Firmicutes (r = −0.43), etc. were also found to be negatively correlated with subtype D (Figure 3F). In addition, intra-group correlations of beneficial and harmful bacteria were analyzed for different diseases and subtypes (Figure S6).

| Gut microbial characteristics among different microbial subtypes
As the main inhabitants of the gut, gut bacteria have a crucial impact on gut health, and the overall situation of gut bacteria can be described by the following 15 characteristics. Considering the potential association between gut microbial characteristics and gut bacterial subtypes, the intra-group correlation among different subtypes and various characteristics was analyzed (Figure 4A).

| Number of species
The number of strains detected above the minimum threshold (at least one test sequence) showed a notable positive correlation with subtype A (r = 0.15).

| Enterotype
According to the species and abundance of human intestinal flora, the intestinal microecology can be classified into three enterotypes, namely, Prevotella, Bacteroides, and Ruminococcus. 26 Positive correlations of different degrees existed between enterotype and subtype A (r = 0.22), B (r = 0.11), and C (r = 0.20), while there was a negative correlation with subtype D (r = −0.55).

| Firmicutes/Bacteroidetes
The Firmicutes/Bacteroidetes ratio is an indicator of diseases, including obesity and inflammatory bowel disease. 28 The ratio was positively correlated with subtypes B (r = 0.17) and C (r = 0.27), while subtype D was negatively correlated with the ratio (r = −0.40).
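For reference, the ratio itself is a per-sample quotient of two phylum-level abundances. The sketch below uses hypothetical abundances and a hypothetical function name, and the handling of a zero denominator is an assumption rather than the authors' rule.

```python
def firmicutes_bacteroidetes_ratio(phylum_abundance):
    """Firmicutes abundance divided by Bacteroidetes abundance for one sample."""
    firmicutes = phylum_abundance.get("Firmicutes", 0.0)
    bacteroidetes = phylum_abundance.get("Bacteroidetes", 0.0)
    if bacteroidetes == 0:
        # Assumed convention: undefined when both are absent, infinite otherwise.
        return float("inf") if firmicutes > 0 else float("nan")
    return firmicutes / bacteroidetes


# Hypothetical phylum-level relative abundances for a single stool sample.
sample = {"Firmicutes": 0.60, "Bacteroidetes": 0.30, "Proteobacteria": 0.10}
fb_ratio = firmicutes_bacteroidetes_ratio(sample)
print(fb_ratio)
```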

| Number of beneficial bacteria
Gut bacteria are divided into beneficial and harmful bacteria based on physiological function. Subtypes A, B, and C were positively correlated with the number of beneficial bacteria, but there was no statistical significance (p > 0.05).

| Beneficial bacteria content
The dominance of beneficial bacteria inhibits the propagation of pathogenic bacteria and helps maintain intestinal health, while the dominance of harmful bacteria may lead to diseases, such as diarrhea and CRC. Subtype D was found to have a positive correlation with beneficial bacteria content (r = 0.08).

| Number of potentially harmful bacteria
In a healthy gut, beneficial and harmful bacteria are in a dynamic balance. Under the induction of environmental factors, potentially harmful bacteria may break the balance and induce disease. The number of potentially harmful bacteria was positively correlated with subtypes B (r = 0.11) and C (r = 0.08), while it was negatively correlated with subtype D (r = −0.13).

| Potentially harmful bacteria content
The increase in potentially harmful bacteria could fuel the increase in disease risk. Subtype A was positively correlated with the potentially harmful bacteria content (r = 0.09).

| Microbial balance score
The balance of microecology ensures the normal physiological functions of the host, including digestion and immunity. Subtype A was significantly positively correlated with the microbial balance score (r = 0.18).

| Microbial diversity index
A well-balanced and highly diverse gut flora contributes to a healthy gut. Subtype A was significantly positively correlated with the microbial diversity index (r = 0.19).
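The text does not state which diversity index was used. As one common possibility, the Shannon index can be computed from per-taxon read counts as sketched below; the choice of Shannon, the function name, and the counts are assumptions for illustration.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one read")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical read counts for four taxa in two samples.
even = shannon_index([25, 25, 25, 25])   # evenly distributed community
skewed = shannon_index([97, 1, 1, 1])    # community dominated by one taxon
print(round(even, 3), round(skewed, 3))
```

With the same number of taxa, the evenly distributed sample scores higher than the dominated one, which is the sense in which a higher index reflects a more diverse community.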

| Number of gram-positive bacteria
The results showed that the number of gram-positive bacteria was significantly positively correlated with subtype B (r = 0.11), while it was negatively correlated with subtype D (r = −0.13).

| Gram-positive bacteria content
Subtype B was found to be positively correlated with the gram-positive bacteria content (r = 0.35).

| Oxidative stress tolerance
Subtype C was positively correlated with the oxidative stress tolerance (r = 0.16).

| Biofilm formation
Biofilms play a key role in gut diseases, such as providing a protective environment for pathogens that evade host immunity. The results revealed that biofilm formation was positively correlated with subtype A (r = 0.17), but negatively correlated with subtype C (r = −0.22).
To fully understand the relationship between the gut microbial characteristics and gut microbial subtype, the intergroup correlation among gut bacterial subtypes and microbial characteristics was analyzed (Figure 4B). For instance, Firmicutes/Bacteroidetes had a higher correlation with subtype B.
Additionally, considering the potential association between gut bacteria and different colorectal diseases, differential bacteria were included for intra-group and intergroup correlation analysis for different diseases (Figure 4C,D). AA was negatively correlated with the number of beneficial bacteria (r = −0.14), potentially harmful bacteria content (r = −0.11), etc.; CRC was positively correlated with the number of potentially harmful bacteria (r = 0.18), number of gram-positive bacteria (r = 0.16), etc.; NC was positively correlated with microbial balance score (r = −0.16) and negatively correlated with Firmicutes/Bacteroidetes (r = 0.16); Firmicutes/Bacteroidetes showed a higher correlation with AA, while the number of species, the number of gram-positive bacteria, the microbial diversity index, etc. had a higher correlation with NC.

| Improved CRC identification performance based on gut microbial subtypes
The abundance difference in gut bacteria from the 914 stool samples was analyzed by linear discriminant analysis and the Kruskal-Wallis test. For example, Lactobacillaceae, Bacilli, etc. were enriched in CRC, while Clostridia, Firmicutes, etc. were enriched in AA (Figure S7A). The abundance of Bacteroides was higher in NC, while the abundance of Lactobacillus was higher in CRC (Figure S7B). Some differential bacteria were common to both diseases and subtypes (Figure S6). For instance, subtype A shared differential bacteria with the disease groups, such as Faecalibacterium in NC, Lactobacillaceae in CRC, and Erysipelatoclostridiaceae in AA.
Based on the important differential bacteria screened by the Kruskal-Wallis test, CatBoost models were established for each gut microbial subtype for the identification of colorectal diseases, including AA and CRC. After the establishment of subtypes, the disease prediction ability of the machine learning models was significantly improved. The sample size for subtype B was only 57 cases (25 CRC, 5 NC, and 27 AA), so there was no test set. The disease identification models of subtype C performed best, with accuracies of 78.57% for CRC, 90.00% for AA, and 93.55% for CRC/AA (Figure S8, Table 6).
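The accuracy, sensitivity, and specificity used to compare models before and after subtype classification follow from a confusion matrix, as in the sketch below. The function name and the labels and predictions are hypothetical; they are not outputs of the CatBoost models in this study.

```python
def binary_metrics(y_true, y_pred, positive="CRC"):
    """Accuracy, sensitivity, and specificity for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def safe_div(num, den):
        return num / den if den else float("nan")

    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "sensitivity": safe_div(tp, tp + fn),   # true positive rate
        "specificity": safe_div(tn, tn + fp),   # true negative rate
    }


# Hypothetical test-set labels and model predictions.
y_true = ["CRC", "CRC", "CRC", "CRC", "NC", "NC", "NC", "NC"]
y_pred = ["CRC", "CRC", "CRC", "NC", "NC", "NC", "NC", "CRC"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```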

| DISCUSSION
In this study, unsupervised clustering was used for classification, and four gut microbial subtypes were finally obtained according to the proportion of CRC. Considering the close association between gut bacteria and CRC, the correlation among gut microbial subtypes, clinical information, and CRC pathological characteristics was analyzed, and significant differences in the correlation among them were found. Moreover, the correlation of CRC-associated bacteria and gut microbial characteristics explored in previous studies with different diseases and different subtypes was analyzed. Finally, the predictive ability of disease identification models based on different gut microbial subtypes was significantly optimized, which showed the potential of gut microbial subtypes for the diagnosis and prevention of CRC.
Based on the dominant bacterial species in the human gut, the concept of "enterotype" proposed in 2011 has the same classification potential as blood type and is currently a hot research topic. 26 Inspired by that, gut microbial subtypes based on unsupervised clustering and disease (CRC+AA) proportion were proposed for the first time in this study. Unsupervised learning has been successfully practiced in analyzing disease subphenotypes and their corresponding clinical information, aiming at stratified diagnosis and treatment. 29 In this study, the human gut microbiota were classified into nine clusters via unsupervised clustering. Among the nine clusters, C2 showed the highest CRC+AA/NC ratio, followed by C5. C2 and C5 corresponded to subtypes B and C, respectively, which were at high risk of CRC. The clusters with a high disease proportion were more consistent with the gut microbiota profile of CRC.
According to the disease (CRC+AA) proportion, the remaining clusters with relatively close disease proportions were merged, and the human gut microbiota were finally divided into four subtypes. Gut microbial subtypes were not related to age, gender, and other factors (p > 0.05), which showed stability similar to that of the "enterotype." 26 The classification of the gut based on the distinctive features of gut bacteria derived from CRC has the potential to enhance population-specific, stratified diagnosis and treatment, thus offering valuable clinical guidance for the identification of CRC. Besides, the characteristic bacteria of each subtype were found, namely Bacteroides for subtype D, Blautia for subtype C, Escherichia-Shigella for subtype A, and Streptococcus for subtype B. Recently, the association between Bacteroides and CRC was strongly supported by the discovery through Mendelian randomization that 13 unclassified Bacteroides genera increased the risk of CRC by 2%-15%. 30 Besides, Escherichia-Shigella, which can promote inflammatory bowel disease (IBD), was highly enriched in subtype A. 31 The "driver-passenger" model of CRC indicated that IBD was an important predisposition factor for CRC mediated by "driver" bacteria. 32 Therefore, it is necessary to pay attention to the relationship among IBD, gut bacteria, and CRC, which could provide directions for the diagnosis and prevention of CRC for subtype A.
The correlation among gut microbial subtypes and gut microbial characteristics was analyzed, which indicated the interaction between gut microbiota and hosts. The expansion of facultative anaerobes and the decrease in obligate anaerobes can disrupt gut microbiota homeostasis and ultimately accelerate the occurrence and progression of chronic inflammation and CRC. 33 In this study, the content of anaerobic bacteria was negatively correlated with subtype B, while the content of facultative anaerobic bacteria was positively correlated with subtype B.
For subtype B, which had the highest proportion of CRC, the changed contents of facultative anaerobic and anaerobic bacteria provide directions for the diagnosis and treatment of CRC. Subtype C was significantly positively correlated with the Lachnospiraceae family, including Blautia, which is an important producer of short-chain fatty acids in the gut. 34 Although a few members of the Lachnospiraceae family are harmful to the host, such as the pro-inflammatory Blautia gnavus, a recent study revealed that Lachnospiraceae family bacteria collectively inhibited the progression of CRC by promoting the immune surveillance function of CD8+ T cells. 35,36
Different subtypes of gut microbiota had different risks of developing CRC. The criteria for measuring CRC risk could be complex, including the proportion of the disease population in each subtype and the correlation with CRC-associated bacteria and gut microbial characteristics. First, the proportion of CRC and AA in each gut microbial subtype has a significant impact on the risk of CRC. The gut microbiota of CRC and AA show remarkable changes, which provides promising biomarkers for CRC diagnosis. 37,38 The accumulation of patients with CRC and the precancerous lesion AA makes the gut microbial characteristics of a subtype more similar to those of CRC. Moreover, the correlation between CRC-associated bacteria and each subtype reflects, to some extent, the CRC risk of gut subtypes from the perspective of gut microbes. A subtype that has a high positive correlation with positively related bacteria, a high negative correlation with negatively related bacteria, and a high correlation with related bacteria is considered a gut microbial subtype with high CRC risk. For instance, Bacilli and Lactobacillales, which were identified as positively related bacteria for CRC, showed a significant positive correlation with subtype B, which indicated an increased risk of CRC for subtype B. 39,40 Erysipelotrichia, Erysipelotrichaceae, Solobacterium, etc., have been identified as positively related bacteria for CRC in previous studies, and they had a significant negative correlation with subtype D in this study, which suggested a relatively low CRC risk for subtype D. [40][41][42] Additionally, the association of gut microbial characteristics with different subtypes, such as the contents of facultative anaerobes and obligate anaerobes associated with the imbalance of gut microecology, can be an important reference for measuring the risk of CRC in different subtypes. In general, according to our results, subtype B could rank first for the risk of CRC, followed by subtype C, subtype A, and subtype D.
It was found that lipid metabolism is associated with gut microbial subtypes. High levels of cholesterol, which are associated with a westernized diet, are regarded as a potential risk factor for CRC. 39 Many patients with CRC have abnormal lipid metabolism, and this was associated with altered gut microbiota in previous studies, which provided an opportunity for cholesterol accumulation that may further exacerbate CRC progression. 43,44 In this study, statistical differences in total cholesterol were found among the four gut microbial subtypes, and the highest content was found in subtype C, whose CRC/NC ratio was relatively high. The results indicated that cholesterol may be related to the classification of gut types, thus making the gut microbial subtypes more convincing to a certain extent. Further research is required to focus on the links among cholesterol, CRC, and gut bacteria. It was found that Actinomyces have cholesterol-degrading capabilities and are highly enriched in moderately differentiated CRC patients, with potential as a biological target for the degree of CRC differentiation. 11,45 Function prediction and analysis of Actinomyces illustrated that it can induce high expression of TLR2, TLR4, and NF-κB in young-onset CRC and reduce infiltration of CD8+ T cells in the tumor microenvironment, which could provide a direction for the mechanism study of CRC induced by Actinomyces. 46 There is substantial evidence that gut bacteria are involved in cholesterol metabolism, and abnormal cholesterol metabolism is a hallmark of CRC. 44,47,48 In this study, Actinomyces were enriched in the high cholesterol group at the genus level, but the clinical genotypes of Actinomyces were diverse, which suggested the necessity of exploring the important role of Actinomyces in cholesterol metabolism at a more precise taxonomic level in the future. 49
However, there are still some limitations in this study. First, although they are the most numerous inhabitants of the gut, enteroviruses were not included in the sequencing. In the future, enteroviruses should be included in research to explore the interaction between enteroviruses and bacteria in CRC, which could provide a comprehensive perspective on the involvement of gut microbes in the occurrence and development of CRC. Second, the mechanism of characteristic differential gut bacteria in the progression of CRC needs to be further supplemented with animal experiments. Third, most healthy volunteers are not willing to undergo colonoscopy on their own initiative, so the number of NC in this study is relatively small. Therefore, the sample size will be further expanded in the future, especially for NC. Moreover, there is a lack of validation cohorts to demonstrate the reliability of gut microbial subtypes. Therefore, more people from different geographies need to be included to verify the suitability of gut microbial subtypes. By doing so, the gut microbial subtype is expected to be further popularized, which could accelerate its progress into clinical practice for the diagnosis of CRC.

| CONCLUSION
Gut microbial subtypes are not affected by most clinical information (such as age and gender) and are associated with the pathological characteristics of CRC. The correlation of CRC-associated bacteria among the four subtypes was different, which reflects the bacteriological characteristics of each subtype. In addition, the differences in the correlation of gut microbial characteristics among the four subtypes provide a direction for future research on the mechanism of microbe-host interaction in CRC. Most importantly, the gut microbial subtype via unsupervised clustering provides diagnostic biomarkers for CRC stratification.

written informed consent. All methods were performed in accordance with the relevant guidelines and regulations in ethics approval and consent to participate.

ORCID
Shuwen Han https://orcid.org/0000-0001-6180-9565
Jing Zhuang https://orcid.org/0000-0002-6910-8824
Xi Yang https://orcid.org/0000-0001-6863-0066

FIGURE 1
Gut microbial typing and their composition analysis. (A) Consensus cumulative distribution function (CDF). K represents the number of clusters, and different colors represent CDF curves with different K values. The higher the stability of the CDF curve, the more reliable the clustering results corresponding to the K value. (B) Consensus matrix heat map. Cluster refers to the number of cluster subtypes, and subtype refers to the number of gut microbial subtypes. The clustering results for K = 9 are displayed. (C) Cluster composition of colorectal diseases. The proportion of the 9 clusters in different colorectal disease populations, including CRC, AA, and NC, is shown. (D) Subtype composition of colorectal diseases. The proportion of the 4 gut microbial subtypes in different colorectal disease populations, including CRC, AA, and NC, is shown. (E) Bacterial structure of gut microbial subtypes. The composition of the top 20 gut bacteria in the four gut microbial subtypes is shown. (F) Chord diagram of gut bacteria among different subtypes. Different colors correspond to gut microbial subtypes. The longer the string, the greater the correlation. (G) Characteristic gut bacteria of gut microbial subtypes. The abundance of Escherichia-Shigella, Streptococcus, Blautia, and Bacteroides in the four gut microbial subtypes was analyzed.

TABLE 2
Composition and ratio of clusters with a clustering number of 9.

TABLE 4 Composition and ratio of subtypes.

Differences in the abundance of the common gut bacteria in specific clinical information groups. (A) Differential bacteria among four subtypes. The top 20 differential bacteria among four gut microbial subtypes were shown. "***" means 0.0001 < p ≤ 0.001, with significant statistical difference. (B) Venn diagram for subtype and total cholesterol. The orange circle represents the differential bacteria among different gut microbial subtypes, the pink circle represents the differential bacteria between different total cholesterol levels, and the overlapping part of the circle represents the common differential bacteria. (C) Venn diagram for subtype and free fatty acid. The orange circle represents the differential bacteria among different gut microbial subtypes, the pink circle represents the differential bacteria among different free fatty acid levels, and the overlapping part of the circle represents the common differential bacteria. (D) Common differential bacteria in different subtypes and total cholesterol content. (E) Common differential bacteria in different subtypes and free fatty acid. "*" 0.01 < p ≤ 0.05, "**" 0.001 < p ≤ 0.01, and "***" 0.0001 < p ≤ 0.001.

(r = 0.56), Clostridia (r = 0.45), Eubacteriales (r = 0.45), etc. were found to be positively correlated with subtype C. Moreover, Blautia (r = −0.36), Clostridia (r = −0.40), Eubacteriales (r = −0.40), Firmicutes (r = −0.43), etc.

22), B (r = 0.11), and C (r = 0.20), while there was negative correlation with subtype D (r = −0.55).

FIGURE 3 Correlation of CRC-associated bacteria with different colorectal diseases and subtypes. (A) Correlation heat map of positively related bacteria for colorectal diseases. (B) Correlation heat map of positively related bacteria for gut microbial subtypes. (C) Correlation heat map of negatively related bacteria for colorectal diseases. (D) Correlation heat map of negatively related bacteria for gut microbial subtypes. (E) Correlation heat map of related bacteria for colorectal diseases. (F) Correlation heat map of related bacteria for gut microbial subtypes. Colorectal diseases include CRC, AA, NC, and gut microbial subtypes include subtype A, B, C, and D. The sign of the correlation coefficient r is independent of the magnitude of the correlation, with "+" representing a positive correlation and "-" representing a negative correlation. The darker the color, the greater the correlation. "*" 0.01 < p ≤ 0.05, "**" 0.001 < p ≤ 0.01, "***" 0.0001 < p ≤ 0.001.

FIGURE 4 Correlation of gut microbial characteristics with diseases and subtypes and CRC prediction effect of CatBoost models. (A) Correlation heat map of gut microbial characteristics for gut microbial subtypes. (B) Chord diagram of gut microbial characteristics among different subtypes. (C) Correlation heat map of gut microbial characteristics for colorectal diseases. (D) Chord diagram of gut microbial characteristics among different diseases. Prediction targets of CatBoost models include CRC, AA, and CRC/AA, with different ranges of colorectal disease populations for the three targets. The accuracy, sensitivity, and specificity of the model are the comparison indexes before and after classification.
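The three comparison indexes named in the caption above (accuracy, sensitivity, specificity) can be derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' evaluation code; the function name `binary_metrics` and the toy labels are hypothetical and do not come from the study.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity, and specificity for binary labels.

    `positive` marks the disease class (for example CRC); every other
    label is treated as the negative class.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif pred == positive:
            fp += 1
        else:
            tn += 1

    def ratio(num, den):
        # Undefined ratios (no samples of that class) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


# Toy labels (illustrative only): 1 = disease, 0 = control.
toy_truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
toy_metrics = binary_metrics(toy_truth, toy_pred)
```

Computing the same three values for a model trained on all samples and for models trained within each subtype would give the kind of before/after comparison the caption describes, assuming the same held-out samples are used for both.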

3.5.4 | Anaerobic bacteria content
Intestinal flora can be divided into predominant microflora and sub-dominant microflora according to their number. Most of the predominant microflora are obligate anaerobic bacteria and play an important guiding role in the physiological and pathological functions of the whole flora. Subtype A (r = −0.15) and B (r = −0.30) were negatively correlated with anaerobic bacteria content, while subtype C (r = 0.29) and D (r = 0.15) were positively correlated.

3.5.5 | Facultative anaerobe content
The sub-dominant microflora are mainly aerobic bacteria or facultative anaerobic bacteria and have potential pathogenicity. Contrary to the results of anaerobic bacteria content, subtype C (r = −0.21) and D (r = −0.23) were negatively correlated with facultative anaerobe content, while subtype A (r = 0.14) and B (r = 0.31) were positively correlated.
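One way to obtain a per-subtype correlation coefficient of this kind is to correlate a one-vs-rest subtype indicator with the microbial characteristic. The sketch below assumes a Pearson (point-biserial) correlation; the text above does not state which coefficient the authors used, and the helper names and toy values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def subtype_correlations(subtypes, values):
    """Correlate each subtype's membership indicator (1 = member, 0 = not)
    with a per-sample characteristic such as anaerobic bacteria content."""
    result = {}
    for label in sorted(set(subtypes)):
        indicator = [1.0 if s == label else 0.0 for s in subtypes]
        result[label] = pearson(indicator, values)
    return result


# Toy data (illustrative only, not values from the study).
toy_subtypes = ["A", "A", "B", "B", "C", "C", "D", "D"]
toy_anaerobe_fraction = [0.30, 0.35, 0.20, 0.25, 0.70, 0.75, 0.60, 0.65]
toy_correlations = subtype_correlations(toy_subtypes, toy_anaerobe_fraction)
```

In this toy example, subtypes whose samples lie above the overall mean receive a positive r and those below it a negative r, which mirrors the sign pattern reported for subtypes C and D versus A and B.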

TABLE 1 Clinical information before gut typing.
Column headers: Group; p-value; CRC (n = 376); AA (n = 363); NC (n = 175).

Clinical information of different subtypes.

TABLE 5 Disease identification models.

TABLE 6